Sources with the lowest average logarithmic rank:

Source | # of sentences | Average logarithmic rank |
---|---|---|
http://ang.wikipedia.org/wiki/Gifica | 11 | 5.33 |
http://ang.wikipedia.org/wiki/Sarawak_Brego | 12 | 5.34 |
http://ang.wikipedia.org/wiki/Swēoland | 11 | 5.35 |
http://ang.wikipedia.org/wiki/Romulus_Augustulus | 13 | 5.43 |
http://ang.wikipedia.org/wiki/Englisc_folc | 15 | 5.48 |
http://ang.wikipedia.org/wiki/Ēadweard_III_Engla_Cyning | 12 | 5.48 |
http://ang.wikipedia.org/wiki/Ēadweard_Sweartæþeling | 15 | 5.48 |
http://ang.wikipedia.org/wiki/Harold_III_Norrena_Cyning | 22 | 5.49 |
http://ang.wikipedia.org/wiki/Sisavang_Vatthana | 14 | 5.52 |
http://ang.wikipedia.org/wiki/Anna_Ēastengla_Cyning | 12 | 5.54 |
http://ang.wikipedia.org/wiki/Alexis_Tsipras | 13 | 5.56 |
http://ang.wikipedia.org/wiki/Nicaragua | 11 | 5.59 |
http://ang.wikipedia.org/wiki/Oþomanisce_Rīce | 11 | 5.61 |
http://ang.wikipedia.org/wiki/Ēadgar_Æðeling | 22 | 5.61 |
http://ang.wikipedia.org/wiki/Sierra_Leone | 12 | 5.62 |
http://ang.wikipedia.org/wiki/Cartaina | 13 | 5.63 |
http://ang.wikipedia.org/wiki/Godrum | 23 | 5.63 |
http://ang.wikipedia.org/wiki/Lundenceaster | 20 | 5.64 |
http://ang.wikipedia.org/wiki/Hloðwig_IV_Francena_Cyning | 11 | 5.65 |
http://ang.wikipedia.org/wiki/Moretoin | 13 | 5.68 |
http://ang.wikipedia.org/wiki/Grundgesetnes_þāra_Geāndena_Rīca_American | 12 | 5.69 |
http://ang.wikipedia.org/wiki/Thomas_Cranmer | 15 | 5.69 |
http://ang.wikipedia.org/wiki/Paris | 11 | 5.72 |
http://ang.wikipedia.org/wiki/Lobengula | 12 | 5.72 |
http://ang.wikipedia.org/wiki/Rīdungfȳrramm | 38 | 5.72 |
http://ang.wikipedia.org/wiki/Missouri | 14 | 5.73 |
http://ang.wikipedia.org/wiki/Napoleon_I_Francena_Cāsere | 30 | 5.77 |
http://ang.wikipedia.org/wiki/Australisc_rīce | 12 | 5.80 |
http://ang.wikipedia.org/wiki/Searoburg | 13 | 5.80 |
http://ang.wikipedia.org/wiki/Rīcard_and_Mortig | 13 | 5.81 |

Sources with the highest average logarithmic rank:

Source | # of sentences | Average logarithmic rank |
---|---|---|
http://ang.wikipedia.org/wiki/Īslendisc_sprǣc | 23 | 7.04 |
http://ang.wikipedia.org/wiki/Heafodstol | 12 | 6.62 |
http://ang.wikipedia.org/wiki/World_Trade_Center | 19 | 6.62 |
http://ang.wikipedia.org/wiki/Perfume | 33 | 6.60 |
http://ang.wikipedia.org/wiki/Stēamwægn | 13 | 6.57 |
http://ang.wikipedia.org/wiki/Niwsæland | 65 | 6.53 |
http://ang.wikipedia.org/wiki/Wikispell:Cunnung/05-2010 | 11 | 6.52 |
http://ang.wikipedia.org/wiki/Wikispell:Indryhta/05-2010 | 14 | 6.50 |
http://ang.wikipedia.org/wiki/Wætertyge | 11 | 6.49 |
http://ang.wikipedia.org/wiki/Ēadweard_se_Martyr | 11 | 6.47 |
http://ang.wikipedia.org/wiki/Washington_Freedom | 22 | 6.46 |
http://ang.wikipedia.org/wiki/One_World_Trade_Center | 15 | 6.43 |
http://ang.wikipedia.org/wiki/Meremægden | 49 | 6.40 |
http://ang.wikipedia.org/wiki/JavaScript_weorcsearu | 27 | 6.40 |
http://ang.wikipedia.org/wiki/Harold_Gōdwines_sunu | 11 | 6.39 |
http://ang.wikipedia.org/wiki/Pintel | 27 | 6.34 |
http://ang.wikipedia.org/wiki/Niwenglisc_spræc | 28 | 6.33 |
http://ang.wikipedia.org/wiki/Wīnlendisc_cystel | 11 | 6.30 |
http://ang.wikipedia.org/wiki/Nintendo_Wii_U | 11 | 6.27 |
http://ang.wikipedia.org/wiki/Rædwald_Ēastengla_Cyning | 13 | 6.26 |
http://ang.wikipedia.org/wiki/Normandisca_Forcyme | 16 | 6.26 |
http://ang.wikipedia.org/wiki/Elisabeþ_I_Engla_Cwen | 22 | 6.26 |
http://ang.wikipedia.org/wiki/Pīpwēod | 13 | 6.25 |
http://ang.wikipedia.org/wiki/Orosius | 15 | 6.19 |
http://ang.wikipedia.org/wiki/Dave_Dobbyn | 25 | 6.18 |
http://ang.wikipedia.org/wiki/Rīmagiefung | 16 | 6.15 |
http://ang.wikipedia.org/wiki/Grendel | 14 | 6.15 |
http://ang.wikipedia.org/wiki/Gerinnung | 38 | 6.14 |
http://ang.wikipedia.org/wiki/Sǣcū | 12 | 6.14 |
http://ang.wikipedia.org/wiki/Hocig | 11 | 6.13 |

In this subsection we replace average word length with average logarithmic word rank. We take the logarithm of the word rank so that words with high ranks, that is, rare words, are penalized only moderately.
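Query for the first table (the 30 sources with the lowest average logarithmic rank, restricted to sources with more than 10 sentences):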
select source, count(distinct i_s.s_id) as cnt_s, round(avg(log(w.w_id-100)),2) as av from sources so, inv_so i_s, inv_w i, words w where so.so_id=i_s.so_id and i_s.s_id=i.s_id and i.w_id=w.w_id and w.w_id>100 group by source having cnt_s>10 order by av LIMIT 30;
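
The effect of the logarithm can be illustrated with a minimal Python sketch that mirrors the aggregate in the query above for a single source. It assumes that `w_id` serves as the word rank and that `log` is the natural logarithm (as in MySQL); the function name `avg_log_rank` and the sample word IDs are hypothetical and not taken from the corpus.

```python
import math

# The query only considers word IDs above 100 and shifts them by 100.
RANK_OFFSET = 100


def avg_log_rank(word_ids):
    """Mean of log(w_id - 100) over all word IDs > 100, rounded to 2 places.

    Returns None if no word ID exceeds the offset.
    Assumption: natural logarithm, as MySQL's one-argument LOG().
    """
    logs = [math.log(w - RANK_OFFSET) for w in word_ids if w > RANK_OFFSET]
    if not logs:
        return None
    return round(sum(logs) / len(logs), 2)


# Hypothetical word IDs for two sources: the second one swaps the last
# word for a very rare one (rank larger by a factor of 100 after the shift).
common = [150, 220, 400, 1100]
with_rare = [150, 220, 400, 100100]

print(avg_log_rank(common))     # 5.33
print(avg_log_rank(with_rare))  # 6.48
```

Although the rare word's shifted rank is 100 times larger, the average rises by only about 1.15, which is the moderate penalty the logarithm is meant to provide.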